Abstract

We consider various shuffling and unshuffling operations on languages 
and words, and examine their closure properties. Although the main goal is 
to provide some good and novel exercises and examples for undergraduate 
formal language theory classes, we also provide some new results and some 
open problems. 

1 Introduction 

Two kinds of shuffles are commonly studied: perfect shuffle and ordinary shuffle.
For two words $x = a_1 a_2 \cdots a_n$, $y = b_1 b_2 \cdots b_n$ of the same length, we
define their perfect shuffle $x ш y = a_1 b_1 a_2 b_2 \cdots a_n b_n$. For example,
term ш hoes = theorems. Note that $x ш y$ need not equal $y ш x$. This definition
is extended to languages as follows:

$$L_1 ш L_2 = \bigcup_{x \in L_1,\ y \in L_2,\ |x| = |y|} \{x ш y\}.$$

If $x^R$ denotes the reverse of $x$, then note that $(x ш y)^R = y^R ш x^R$.

It is sometimes useful to allow $|y| = |x| + 1$, where $x = a_1 \cdots a_n$ and
$y = b_1 \cdots b_{n+1}$, in which case we define $x ш y = a_1 b_1 \cdots a_n b_n b_{n+1}$.
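As a minimal sketch of this definition, the perfect shuffle of two Python strings can be computed as below. The function name `perfect_shuffle` is ours, chosen for illustration; the sketch also accepts the case where the second word is one letter longer.

```python
def perfect_shuffle(x: str, y: str) -> str:
    """Interleave x and y letter by letter, starting with x.

    Requires len(y) == len(x) or len(y) == len(x) + 1.
    """
    if len(y) not in (len(x), len(x) + 1):
        raise ValueError("need |y| = |x| or |y| = |x| + 1")
    pairs = "".join(a + b for a, b in zip(x, y))
    # When y is one letter longer, its last letter is appended at the end.
    return pairs + y[len(x):]


print(perfect_shuffle("term", "hoes"))  # theorems
```

Swapping the arguments gives a different word in general, which matches the remark that the operation is not symmetric.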

The ordinary shuffle $x Ш y$ of two words is a finite set, the set of words
obtainable from merging the words $x$ and $y$ from left to right, but choosing the next
symbol arbitrarily from $x$ or $y$. More formally,

$$x Ш y = \{ z : z = x_1 y_1 x_2 y_2 \cdots x_n y_n \text{ for some } n \ge 1 \text{ and words } x_1, \ldots, x_n, y_1, \ldots, y_n \text{ such that } x = x_1 \cdots x_n \text{ and } y = y_1 \cdots y_n \}.$$

This definition is symmetric, and $x Ш y = y Ш x$. The definition is extended to
languages as follows:

$$L_1 Ш L_2 = \bigcup_{x \in L_1,\ y \in L_2} (x Ш y).$$

(As a mnemonic, the symbol Ш is larger than ш in size, and similarly Ш
generally produces a set larger in cardinality than ш.)
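The set of all interleavings can be enumerated recursively: the first letter of any word in the shuffle comes either from the first word or from the second. The following is a small sketch under that reading; the name `shuffle` is ours, and the memoization is only a convenience for repeated calls.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shuffle(x: str, y: str) -> frozenset:
    """Return the set of all interleavings of x and y (the ordinary shuffle)."""
    if not x or not y:
        return frozenset({x + y})
    # The first letter of the result is taken from x or from y.
    from_x = {x[0] + z for z in shuffle(x[1:], y)}
    from_y = {y[0] + z for z in shuffle(x, y[1:])}
    return frozenset(from_x | from_y)


print(sorted(shuffle("ab", "c")))  # ['abc', 'acb', 'cab']
```

The result can have up to $\binom{|x|+|y|}{|x|}$ elements, so this enumeration is only practical for short words.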

As is well-known, the shuffle (resp., perfect shuffle) of two regular languages 
is regular, and the shuffle (resp., perfect shuffle) of a context-free language with 
a regular language is context-free. Perhaps the easiest way to see all these results 
is by using morphisms and inverse morphisms, and relying on the known closure 
properties of these transformations, as follows: 

If $L_1, L_2 \subseteq \Sigma^*$, create a new alphabet $\Sigma'$ by putting primes on all the letters
of $\Sigma$. Define $h_1(a) = h_2(a') = a$ and $h_1(a') = h_2(a) = \epsilon$ for $a \in \Sigma$. Define
$h(a) = h(a') = a$ for $a \in \Sigma$. Then

$$L_1 Ш L_2 = h(h_1^{-1}(L_1) \cap h_2^{-1}(L_2)).$$

In a similar way,

$$L_1 ш L_2 = h(h_1^{-1}(L_1) \cap h_2^{-1}(L_2) \cap (\Sigma\Sigma')^*).$$

However, the shuffle (resp., perfect shuffle) of two context-free languages need
not be context-free. For example, if $L_1 = \{a^m b^m : m \ge 1\}$ and $L_2 = \{c^n d^n : n \ge 1\}$,
then $L := L_1 Ш L_2$ is not a CFL. If it were, then
$L \cap a^+ c^+ b^+ d^+ = \{a^m c^n b^m d^n : m, n \ge 1\}$ would be a CFL, which it isn't
(via the pumping lemma).
Similarly, if $L_3 = \{a^m b^{2m} : m \ge 1\}$ and $L_4 = \{a^{2n} b^n : n \ge 1\}$, then
$L_3 ш L_4 = \{a^{2n} (ba)^n b^{2n} : n \ge 1\}$, which is clearly not a CFL.
For these, and other facts, see [1].



2 Self-shuffles

Instead of shuffling languages together, we can take a language and shuffle (resp.,
perfect shuffle) each word with itself. Another variation is to shuffle each word
with its reverse. This gives four different transformations on languages, which we
call self-shuffles:

$$\mathrm{ss}(L) = \bigcup_{x \in L} (x Ш x)$$
$$\mathrm{ssr}(L) = \bigcup_{x \in L} (x Ш x^R)$$
$$\mathrm{pss}(L) = \bigcup_{x \in L} \{x ш x\}$$
$$\mathrm{pssr}(L) = \bigcup_{x \in L} \{x ш x^R\}$$

We would like to understand how these transformations affect regular and
context-free languages. We obtain some results, but other questions are still open.

Theorem 1. If $L$ is regular, then $\mathrm{ss}(L)$ need not be context-free.

Proof. We show that $\mathrm{ss}(\{0,1\}^*)$ is not a CFL. Suppose it is, and consider
$L' = \mathrm{ss}(\{0,1\}^*) \cap R$, where $R = \{0 1^a 0^{b+1} 1^{c+1} 0^d 1 : a, b, c, d \ge 1\}$. Since $R$ is
regular, it suffices to show that $L'$ is not context-free.

Now consider an arbitrary word $w \in L'$. Then $w = 0 1^a 0^{b+1} 1^{c+1} 0^d 1$ for some
$a, b, c, d \ge 1$, and there exists a $y \in \{0,1\}^*$ such that $w \in y Ш y$. The structure of $w$
allows us to determine $y$. Let $y_1$ and $y_2$ be copies of $y$ such that $w \in y_1 Ш y_2$, and
the first letter of $w$ is taken from $y_1$.

The first symbol of $y$ is evidently 0. It follows that the prefix $0 1^a$ of $w$ is taken
entirely from $y_1$, since the 0 is taken from $y_1$ by definition and the first symbol of
$y_2$ is 0. Therefore $0 1^a$ is a prefix of $y_1$.

It follows that $y_2$ also contains $0 1^a$ as a prefix, and since $a \ge 1$ this is only
possible if the first 0 of $y_2$ is located in the $0^{b+1}$ block of $w$. Otherwise, $y_2$ would
be a subsequence of $0^d 1$ and $y_1$ would have $0 1^a 0^{b+1} 1^{c+1}$ as a prefix (implying that
$y_1 \ne y_2$). Furthermore, the second symbol of $y_2$ being 1 implies that exactly one
of the 0's in the $0^{b+1}$ block is from $y_2$. Thus the rest are from $y_1$ and $0 1^a 0^b$ is a
prefix of $y_1$.

Note that $y_1$ and $y_2$ both end in 1, and $w$ ends in $0^d 1$. By the same logic as
before, we can conclude that $0^d 1$ is a suffix of exactly one of them, and that the
other ends in the $1^{c+1}$ block. Thus $y_2$ contains $0^d 1$ as a suffix and $y_1$ ends in the
$1^{c+1}$ block (otherwise, $y_1 \ne y_2$).

Finally, since the second last symbol of $y_1$ is 0 and $y_1$ ends in the $1^{c+1}$ block,
we can conclude that $y_1$ contains exactly one 1 from the $1^{c+1}$ block and that
$y_1 = 0 1^a 0^b 1$. Unshuffling $y_1$ from $w$ yields $y_2 = 0 1^c 0^d 1$.

Recall that $y_1 = y_2$. So

$$y_1 = 0 1^a 0^b 1 = 0 1^c 0^d 1 = y_2,$$

and since $a, b, c, d \ge 1$ we know that $a = c$ and $b = d$.

If $w \in L'$ then

$$w = 0 1^a 0^{b+1} 1^{a+1} 0^b 1 = 0 1^a 0^b (01) 1^a 0^b 1.$$

Since $w$ was arbitrary, we have

$$L' = \{0 1^a 0^{b+1} 1^{c+1} 0^d 1 : a = c,\ b = d,\ \text{and } a, d \ge 1\} = \{0 1^n 0^m (01) 1^n 0^m 1 : m, n \ge 1\},$$

which is clearly not a CFL, using the pumping lemma. □
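The characterization of $L'$ obtained in this proof can be sanity-checked by brute force for small exponents. The sketch below is ours and is not part of the proof: `is_self_shuffle` decides whether a word lies in $y Ш y$ for some $y$ by a memoized search that may take exponential time, and `word` is a hypothetical helper building $0 1^a 0^{b+1} 1^{c+1} 0^d 1$. The search relies on the observation that the two copies of $y$ can always be chosen so that the $i$-th letter of one copy precedes the $i$-th letter of the other.

```python
from functools import lru_cache
from itertools import product


def is_self_shuffle(w: str) -> bool:
    """Decide whether w is in (y shuffle y) for some word y (exponential search)."""
    n = len(w)
    if n % 2:
        return False

    @lru_cache(maxsize=None)
    def search(i: int, pending: str) -> bool:
        # pending = letters already given to the leading copy of y
        # that the trailing copy still has to match, in order.
        if i == n:
            return pending == ""
        c = w[i]
        # Option 1: the trailing copy consumes w[i].
        if pending and pending[0] == c and search(i + 1, pending[1:]):
            return True
        # Option 2: the leading copy takes w[i], if enough letters remain
        # for the trailing copy to catch up.
        if len(pending) + 1 <= n - i - 1:
            return search(i + 1, pending + c)
        return False

    return search(0, "")


def word(a: int, b: int, c: int, d: int) -> str:
    """Build 0 1^a 0^(b+1) 1^(c+1) 0^d 1."""
    return "0" + "1" * a + "0" * (b + 1) + "1" * (c + 1) + "0" * d + "1"


# For small exponents, membership should hold exactly when a = c and b = d.
consistent = all(
    is_self_shuffle(word(a, b, c, d)) == (a == c and b == d)
    for a, b, c, d in product(range(1, 4), repeat=4)
)
print(consistent)
```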

Remark 2. In a previous version of this paper, proving that $\mathrm{ss}(\{0,1\}^*)$ is not
context-free was listed as an open problem. After this was solved by D. Henshall,
a solution was given by Georg Zetzsche independently.

Similarly, we can show 

Theorem 3. $L = \bigcup_{w \in \{0,1\}^*} (w Ш w Ш w)$ is not context-free.

Proof. We use Ogden's lemma. Consider

$$L' = L \cap 0^* 1 0^* 1 0^* 1.$$

Pick $s = 0^n 1 0^n 1 0^n 1$ in $L'$ to pump. Write $s = uvxyz$ and mark the middle
block of 0's. If $v$ begins in the middle block of 0's, then pump up to obtain
$s' = 0^n 1 0^j 1 0^k 1$, where $n < j$ and $n \le k$. We can't have $s' \in w Ш w Ш w$ because
the first $w$ (the one ending at the first 1) is too short. If $v$ begins in the first
block of 0's, then $y$ occurs in the middle block, so now pump down to obtain
$s' = 0^i 1 0^j 1 0^n 1$, where $i \le n$ and $j < n$. Again, we can't have $s' \in w Ш w Ш w$,
because the third $w$ (the one ending at the third 1) must contain all of the 0's
immediately preceding the final 1, and hence is too long. □



Clearly $\mathrm{ss}(\{0,1\}^*)$ is in NP, since given a word $w$ we can guess $x$ and check
that $w \in x Ш x$. However, we do not know whether we can solve membership for
$\mathrm{ss}(\{0,1\}^*)$ in polynomial time. This question is apparently originally due to Jeff
Erickson [2], and we learned about it from Erik Demaine.



Open Problem 4. Is ss({0, 1}*) in P? 



We mention a few related problems. Mansfield [4] showed that, given words
$w, x, y$, one can decide in polynomial time if $w \in x Ш y$. Later, the same
author [5] and, independently, Warmuth and Haussler [6] showed that, given words
$w, x_1, x_2, \ldots, x_n$, deciding if $w \in x_1 Ш x_2 Ш \cdots Ш x_n$ is NP-complete. However,
the decision problem implied by Open Problem 4 asks something different: given
$w$, does there exist $x$ such that $w \in x Ш x$?
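For the two-word case, one way to see polynomial-time decidability is the textbook dynamic program over pairs of prefix lengths. The sketch below is that standard technique, not necessarily the algorithm of [4]; the name `in_shuffle` is ours. It runs in time proportional to $|x| \cdot |y|$ and keeps a single row of the table.

```python
def in_shuffle(w: str, x: str, y: str) -> bool:
    """Decide whether w is an interleaving of x and y."""
    if len(w) != len(x) + len(y):
        return False
    # reach[j] is True when w[:i+j] is an interleaving of x[:i] and y[:j],
    # for the current value of i (rolling one-row table).
    reach = [False] * (len(y) + 1)
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i == 0 and j == 0:
                reach[j] = True
                continue
            c = w[i + j - 1]
            # reach[j] still holds the value for row i - 1 at this point.
            from_x = i > 0 and reach[j] and x[i - 1] == c
            # reach[j - 1] has already been updated for row i.
            from_y = j > 0 and reach[j - 1] and y[j - 1] == c
            reach[j] = from_x or from_y
    return reach[len(y)]


print(in_shuffle("theorems", "term", "hoes"))  # True
```

The self-shuffle question is different because $x$ is not given: trying every candidate $x$ with this routine would take exponential time.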



Open Problem 5. Determine a simple closed form for

$$a_k(n) := \left| \bigcup_{x \in \{0, 1, \ldots, k-1\}^n} (x Ш x) \right|.$$



The first few terms are given as follows:

n              0      1      2      3      4      5      6      7      8      9
a_2(n)         1      2      6     22     82    320   1268   5102  20632  83972
a_3(n)         1      3     15     93    621   4425  32703 248901
a_4(n)         1      4     28    244   2332  23848 254416
a_5(n)         1      5     45    505   6265  83225
a_6(n)         1      6     66    906  13806 225336

Clearly $a_i(0) = 1$, $a_i(1) = i$, and $a_i(2) = 2i^2 - i$. Empirically we have
$a_i(3) = 5i^3 - 5i^2 + i$, $a_i(4) = 14i^4 - 21i^3 + 5i^2 + 3i$, and
$a_i(5) = 42i^5 - 84i^4 + 32i^3 + 21i^2 - 10i$.

This suggests that $a_i(n) = \frac{\binom{2n}{n}}{n+1} i^n - \binom{2n-1}{n+1} i^{n-1} + O(i^{n-2})$, but we do not have a proof.
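Small values of $a_k(n)$ can be recomputed by direct enumeration, which is one way to check conjectured formulas against data. The sketch below is ours: `shuffle` enumerates interleavings recursively, and `a(k, n)` collects the union over all words of length $n$. It assumes $k \le 10$ so that letters can be written as single digits, and it is only feasible for very small $n$ and $k$.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def shuffle(x: str, y: str) -> frozenset:
    """Return the set of all interleavings of x and y."""
    if not x or not y:
        return frozenset({x + y})
    from_x = {x[0] + z for z in shuffle(x[1:], y)}
    from_y = {y[0] + z for z in shuffle(x, y[1:])}
    return frozenset(from_x | from_y)


def a(k: int, n: int) -> int:
    """Count the distinct words in the union of (x shuffle x) over all x of length n."""
    alphabet = "0123456789"[:k]  # assumption: k <= 10
    seen = set()
    for letters in product(alphabet, repeat=n):
        x = "".join(letters)
        seen |= shuffle(x, x)
    return len(seen)


print([a(2, n) for n in range(5)])
print([a(3, n) for n in range(4)])
```

The printed lists can be compared with the first entries of the rows for $k = 2$ and $k = 3$.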
3 Perfect self-shuffle



We can consider the same question for perfect shuffle. We define

$$\mathrm{pss}(L) = \bigcup_{x \in L} \{x ш x\}.$$

Theorem 6. Both the class of regular languages and the class of context-free
languages are closed under pss.

Proof. Use the fact that $\mathrm{pss}(L) = h(L)$, where $h$ is the morphism mapping $a \to aa$
for each letter $a$. □
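On a single word, the morphism in this proof simply doubles every letter. A tiny sketch (the name `pss_word` is ours) makes the coincidence with the perfect shuffle of a word with itself concrete:

```python
def pss_word(x: str) -> str:
    """Apply the morphism a -> aa to x, i.e. perfectly shuffle x with itself."""
    return "".join(c + c for c in x)


def interleave(x: str, y: str) -> str:
    """Perfect shuffle of two words of equal length."""
    return "".join(a + b for a, b in zip(x, y))


print(pss_word("abc"))  # aabbcc
```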



4 Self-shuffle with reverse 

We now characterize those words $y$ that can be written as a shuffle of a word with
its reverse; that is, as a member of the set $x Ш x^R$.

An abelian square is a word of the form $xx'$ where $x'$ is a permutation of $x$.

Theorem 7. (a) If there exists $x$ such that $y \in x Ш x^R$, then $y$ is an abelian square.
(b) If $y$ is a binary abelian square, then there exists $x$ such that $y \in x Ш x^R$.

We introduce the following notation: if $w = a_1 a_2 \cdots a_n$, then by $w[i..j]$ we
mean the factor $a_i a_{i+1} \cdots a_j$.

Proof. (a) If $y$ is the shuffle of $x$ with its reverse, then the first half of $y$ must
contain some prefix of $x$, say $x[1..k]$. Then the second half of $y$ must contain the
remaining suffix of $x$, say $x[k+1..n]$. Then the second half of $y$ must contain,
in the remaining positions, some prefix of $x$, reversed. But by counting we see
that this prefix must be $x[1..k]$. So the first half of $y$ must contain the remaining
symbols of $x$, reversed. This shows that the first half of $y$ is just $x[1..k]$ shuffled
with $x[k+1..n]^R$, and the second half of $y$ is just $x[k+1..n]$ shuffled with $x[1..k]^R$.

So the second half of $y$ is a permutation of the first half of $y$.

(b) It remains to see that every binary abelian square can be obtained in this
way.

To see this, write $y = xx'$ and note that if $x$ contains $j$ 0's and $n - j$ 1's, then we
can get $y$ by shuffling $0^j 1^{n-j}$ with its reverse. We get the 0's in $x$ by choosing them
from $0^j 1^{n-j}$, and we get the 1's in $x$ by choosing them from $(0^j 1^{n-j})^R$. The second
half $x'$ is obtained in the same way from the symbols that remain, namely the
$1^{n-j}$ left in $0^j 1^{n-j}$ and the $0^j$ left in its reverse. □

Remark 8. The word 012012 is an example of a ternary abelian square that cannot
be written as an element of $w Ш w^R$ for any word $w$.
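Both parts of Theorem 7, and the ternary counterexample of Remark 8, can be checked exhaustively for short words. The following sketch is ours and uses plain enumeration; the helper names `shuffle`, `ssr_words`, `is_abelian_square` and `binary_abelian_squares` are hypothetical.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def shuffle(x: str, y: str) -> frozenset:
    """Return the set of all interleavings of x and y."""
    if not x or not y:
        return frozenset({x + y})
    from_x = {x[0] + z for z in shuffle(x[1:], y)}
    from_y = {y[0] + z for z in shuffle(x, y[1:])}
    return frozenset(from_x | from_y)


def is_abelian_square(w: str) -> bool:
    """True when the second half of w is a permutation of the first half."""
    half = len(w) // 2
    return len(w) % 2 == 0 and sorted(w[:half]) == sorted(w[half:])


def ssr_words(alphabet: str, n: int) -> set:
    """Union of (x shuffle reverse(x)) over all words x of length n."""
    found = set()
    for letters in product(alphabet, repeat=n):
        x = "".join(letters)
        found |= shuffle(x, x[::-1])
    return found


def binary_abelian_squares(n: int) -> set:
    """All binary abelian squares of length 2n."""
    words = ("".join(t) for t in product("01", repeat=2 * n))
    return {w for w in words if is_abelian_square(w)}


# Binary case: the two sets should coincide for each small n.
print(all(ssr_words("01", n) == binary_abelian_squares(n) for n in range(5)))
# Ternary case: 012012 is an abelian square but should not be reachable.
print(is_abelian_square("012012"), "012012" in ssr_words("012", 3))
```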
Remark 9. The preceding proof gives another proof of the classic identity

$$\binom{2n}{n} = \binom{n}{0}^2 + \cdots + \binom{n}{n}^2.$$

To see this, we use the following bijections: the binary words of length $2n$ having
exactly $n$ 0's (and hence $n$ 1's) are in one-one correspondence with the abelian
squares of length $2n$, as follows: take such a word and complement the last $n$ bits.
Thus there are $\binom{2n}{n}$ binary abelian squares of length $2n$.

On the other hand, there are $\binom{n}{i}^2$ words that are abelian squares and have a first
and last half, each with $i$ 0's. Summing this from $i = 0$ to $n$ gives the result.
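The bijection and the resulting identity can be checked numerically for small $n$. The sketch below is ours: `complement_last_half` implements the map described above, and `check` is a hypothetical helper that verifies, for one value of $n$, that the map sends balanced words injectively to abelian squares and that both counts agree with the binomial sum.

```python
from itertools import product
from math import comb


def complement_last_half(u: str) -> str:
    """Flip every bit in the second half of the binary word u."""
    n = len(u) // 2
    flipped = "".join("1" if c == "0" else "0" for c in u[n:])
    return u[:n] + flipped


def is_abelian_square(w: str) -> bool:
    half = len(w) // 2
    return len(w) % 2 == 0 and sorted(w[:half]) == sorted(w[half:])


def check(n: int) -> bool:
    # Binary words of length 2n with exactly n zeros.
    balanced = ["".join(t) for t in product("01", repeat=2 * n)
                if t.count("0") == n]
    images = {complement_last_half(u) for u in balanced}
    return (
        all(is_abelian_square(s) for s in images)
        and len(images) == len(balanced) == comb(2 * n, n)
        and comb(2 * n, n) == sum(comb(n, i) ** 2 for i in range(n + 1))
    )


print(all(check(n) for n in range(7)))
```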

Corollary 10. The language

$$\mathrm{ssr}(\{0,1\}^*) = \bigcup_{x \in \{0,1\}^*} x Ш x^R$$

is not a CFL, but is in P.

Proof. From above, intersecting $\mathrm{ssr}(\{0,1\}^*)$ with $0^+ 1^+ 0^+ 1^+$ gives

$$\{0^m 1^n 0^{m+2k} 1^n : m, n \ge 1 \text{ and } k \ge 0\} \cup \{0^m 1^{n+2k} 0^m 1^n : m, n \ge 1 \text{ and } k \ge 0\}.$$

Now the pumping lemma applied to $z = 0^n 1^n 0^n 1^n$ shows this is not a CFL.

Since we can easily test if a string is an abelian square by counting the number
of 0's in the first half, and comparing it to the number of 0's in the second half, it
follows that $\mathrm{ssr}(\{0,1\}^*)$ is in P. □



As before, we can define

$$b_k(n) := \left| \bigcup_{x \in \{0, 1, \ldots, k-1\}^n} (x Ш x^R) \right|.$$

For $k = 2$, our results above explain $b_k(n)$, but we do not know a closed form for
larger $k$.

The first few terms are given as follows:

n              0      1      2      3      4      5      6      7      8      9
b_2(n)         1      2      6     20     70    252    924   3432  12870  48620
b_3(n)         1      3     15     87    549   3657  25317 180459
b_4(n)         1      4     28    232   2116  20560 208912
b_5(n)         1      5     45    485   5785  73785
b_6(n)         1      6     66    876  12906 203676

Clearly $b_i(0) = 1$, $b_i(1) = i$, and $b_i(2) = 2i^2 - i$. Empirically, we have
$b_i(3) = 5i^3 - 6i^2 + 2i$, $b_i(4) = 14i^4 - 27i^3 + 17i^2 - 3i$, and
$b_i(5) = 42i^5 - 110i^4 + 94i^3 - 17i^2 - 8i$.

This suggests that $b_i(n) = \frac{\binom{2n}{n}}{n+1} i^n - \left( \binom{2n-1}{n} - 2^{n-1} \right) i^{n-1} + O(i^{n-2})$, but we do not have
a proof.

5 Perfect self-shuffle with reverse

We now consider the operation $w \to w ш w^R$ applied to languages. Recall that
$\mathrm{pssr}(L) = \bigcup_{x \in L} \{x ш x^R\}$.

Theorem 11. If $L$ is regular then $\mathrm{pssr}(L)$ is not necessarily regular.

Proof. Let $L = 0^+ 1 0^+$. Then $\mathrm{pssr}(L) \cap 0^+ 1 1 0^+ = \{0^n 1 1 0^n : n \ge 2\}$, which is
clearly not regular. □

Theorem 12. If $L$ is context-free then $\mathrm{pssr}(L)$ is not necessarily context-free.

Proof. Let $L = \{0^m 1^m 2^n 3^n : m, n \ge 1\}$. Then
$\mathrm{pssr}(L) \cap (03)^+ (12)^+ (21)^+ (30)^+ = \{(03)^n (12)^n (21)^n (30)^n : n \ge 1\}$, and this
language is easily seen to be non-context-free. □

Theorem 13. If $L$ is regular then $\mathrm{pssr}(L)$ is necessarily context-free.

We defer the proof of Theorem 13 until Section 6.4 below.

6 Unshuffling

Given a finite word $w = a_1 a_2 \cdots a_n$ we can decimate it into its odd- and
even-indexed parts, as follows:

$$\mathrm{odd}(w) = a_1 a_3 \cdots a_{n - ((n+1) \bmod 2)}$$
$$\mathrm{even}(w) = a_2 a_4 \cdots a_{n - (n \bmod 2)}$$

Similarly, given $w = a_1 a_2 \cdots a_n$ we can extract its first and last halves, as follows:

$$\mathrm{fh}(w) = a_1 a_2 \cdots a_{\lfloor n/2 \rfloor}$$
$$\mathrm{lh}(w) = a_{\lfloor n/2 \rfloor + 1} \cdots a_n$$

We now turn our attention to four "unshuffling" operations:

$$\mathrm{bd}(w) = \mathrm{odd}(w)\, \mathrm{even}(w)$$
$$\mathrm{bdr}(w) = \mathrm{odd}(w)\, \mathrm{even}(w)^R$$
$$\mathrm{bdi}(w) = \mathrm{fh}(w) ш \mathrm{lh}(w)$$
$$\mathrm{bdir}(w) = \mathrm{fh}(w) ш \mathrm{lh}(w)^R$$

6.1 Binary decimation

We first consider a kind of binary decimation, which forms a sort of inverse to
perfect shuffle.

Given a word $w = a_1 a_2 \cdots a_{2n}$ of even length, note that

$$\mathrm{bd}(w) = a_1 a_3 \cdots a_{2n-1} a_2 a_4 \cdots a_{2n}$$

is formed by "unshuffling" the word into its odd- and even-indexed letters. For
example, the French word maigre becomes the word mirage under this operation.

Theorem 14. Neither the class of regular languages nor the class of context-free 
languages is closed under bd. 

Proof. Consider the regular (and context-free) language $L = (00 + 11)^+$. Then
$\mathrm{bd}(L) = \{ww : w \in \{0,1\}^+\}$, which is well-known to be non-context-free. □

6.2 Binary decimation with reverse 

We now consider the operation bdr, which is a kind of binary decimation with
reverse. Note that

$$\mathrm{bdr}(a_1 a_2 \cdots a_{2n}) = a_1 a_3 \cdots a_{2n-1} a_{2n} \cdots a_4 a_2.$$

For example, bdr(friend) = finder and bdr(perverse) = preserve.
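With Python slicing, the two decimation operations are one-liners. The sketch below (function names follow the paper's operation names, but the code is ours) reproduces the word examples given in the text.

```python
def odd(w: str) -> str:
    """Letters at positions 1, 3, 5, ... (1-indexed)."""
    return w[0::2]


def even(w: str) -> str:
    """Letters at positions 2, 4, 6, ... (1-indexed)."""
    return w[1::2]


def bd(w: str) -> str:
    """Binary decimation: odd-indexed letters followed by even-indexed ones."""
    return odd(w) + even(w)


def bdr(w: str) -> str:
    """Binary decimation with reverse: the even-indexed part is reversed."""
    return odd(w) + even(w)[::-1]


print(bd("maigre"), bdr("friend"), bdr("perverse"))  # mirage finder preserve
```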

Theorem 15. The class of regular languages is not closed under bdr.

Proof. Let $L = (00)^+ 11$. Then $\mathrm{bdr}(L) = \{0^n 1 1 0^n : n \ge 1\}$, which is not regular. □

Theorem 16. The class of context-free languages is not closed under bdr.

Proof. Consider $L = \{(03)^n (12)^n : n \ge 1\}$. Then $\mathrm{bdr}(L) = \{0^n 1^n 2^n 3^n : n \ge 1\}$,
which is not context-free. □

Theorem 17. If $L$ is regular, then $\mathrm{bdr}(L)$ is context-free.

Proof. We show how to accept words of $\mathrm{bdr}(L)$ of even length; words of odd
length can be treated similarly.

On input $w = b_1 b_2 \cdots b_{2n}$, a PDA can guess $x = a_1 a_2 \cdots a_{2n}$ in parallel with
the elements of the input. At each stage the PDA compares $a_i$ to $b_{(i+1)/2}$ if $i$ is odd;
and otherwise it pushes $a_i$ onto the stack (if $i$ is even). At some point the PDA
nondeterministically guesses that it has seen $a_{2n}$ and pushed it on the stack; it now
pops the stack (which is holding $a_{2n} \cdots a_4 a_2$) and compares the stack contents to
the rest of the input $w$.

The PDA accepts if $x \in L$ (which it checks by simulating a DFA for $L$ on the guessed
letters in its finite control) and the symbols matched as described. □
6.3 Inverse decimation

We now consider a kind of inverse decimation, which shuffles the first and last
halves of a word.

Note that if $w = a_1 \cdots a_{2n}$ is of even length, then

$$\mathrm{bdi}(w) = a_1 a_{n+1} a_2 a_{n+2} \cdots a_n a_{2n}.$$

Further, $\mathrm{bdi}(\mathrm{bd}(w)) = \mathrm{bd}(\mathrm{bdi}(w)) = w$ for $w$ of even length.

Theorem 18. If $L$ is regular then so is $\mathrm{bdi}(L)$.

Proof. On input $x$ we simulate the DFA for $L$ on the odd-indexed letters of $x$,
starting from $q_0$, and we simulate a second copy of the DFA for $L$ on the
even-indexed letters, starting at some guessed state $q$. Finally, we check to see that our
guess of $q$ was correct: the first copy must end in state $q$, and the second copy
must end in a final state. □

Theorem 19. The class of context-free languages is not closed under bdi.

Proof. Let $L = \{0^m 1^m 2^{2n} 3^{4n} : m, n \ge 1\}$. It is easy to see that

$$\mathrm{bdi}(0^m 1^m 2^{2n} 3^{4n}) = \begin{cases}
(01)^{m-3n} (02)^{2n} (03)^n (13)^{3n}, & \text{if } m \ge 3n; \\
(02)^{m-n} (03)^n (13)^m (23)^{3n-m}, & \text{if } n \le m \le 3n; \\
(03)^m (13)^m (23)^{2n} (33)^{n-m}, & \text{if } m \le n.
\end{cases}$$

Consider $L' := \mathrm{bdi}(L) \cap (03)^+ (13)^+ (23)^+$. From the above we have
$L' = \{(03)^n (13)^n (23)^{2n} : n \ge 1\}$, which is evidently not context-free. □
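The three-case formula in this proof is easy to get wrong by hand, so a direct comparison against the definition is a useful check. The sketch below is ours: `bdi` follows the definition (first half perfectly shuffled with last half), and `predicted` is a hypothetical helper that encodes the case analysis for the words $0^m 1^m 2^{2n} 3^{4n}$.

```python
def bdi(w: str) -> str:
    """Perfectly shuffle the first half of w with its last half."""
    half = len(w) // 2
    first, last = w[:half], w[half:]
    # For odd length the last half is one letter longer; that letter goes last.
    return "".join(a + b for a, b in zip(first, last)) + last[len(first):]


def predicted(m: int, n: int) -> str:
    """Case analysis for bdi(0^m 1^m 2^(2n) 3^(4n))."""
    if m >= 3 * n:
        return "01" * (m - 3 * n) + "02" * (2 * n) + "03" * n + "13" * (3 * n)
    if m >= n:
        return "02" * (m - n) + "03" * n + "13" * m + "23" * (3 * n - m)
    return "03" * m + "13" * m + "23" * (2 * n) + "33" * (n - m)


agree = all(
    bdi("0" * m + "1" * m + "2" * (2 * n) + "3" * (4 * n)) == predicted(m, n)
    for m in range(1, 13)
    for n in range(1, 5)
)
print(agree)
```

Only the case $m = n$ produces a word in $(03)^+ (13)^+ (23)^+$, which is where the language $L'$ comes from.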



6.4 Inverse decimation with reverse 

Note that if $w = a_1 \cdots a_{2n}$ is of even length, then
$\mathrm{bdir}(w) = a_1 a_{2n} a_2 a_{2n-1} \cdots a_n a_{n+1}$.
If $w = a_1 \cdots a_{2n+1}$ is of odd length, we define

$$\mathrm{bdir}(w) = a_1 a_{2n+1} a_2 a_{2n} \cdots a_n a_{n+2} a_{n+1}.$$

Theorem 20. If $L$ is regular then so is $\mathrm{bdir}(L)$.

Proof. On input $x$ we simulate the DFA $M$ for $L$ on the odd-indexed letters of $x$,
starting from $q_0$. We also create an NFA $M'$ accepting $L^R$ in the usual manner, by
reversing the transitions of $M$, and making the set of final states of $M$ the set of
start states, and we simulate $M'$ on the even-indexed letters of $x$. Finally, we check
to see that we meet in the middle. □

Theorem 21. The class of context-free languages is not closed under bdir.

Proof. Consider $L = \{0^{2m} 1^{4m} 2^n 3^n : m, n \ge 1\}$. Then $L$ is a CFL, and it is easy to
verify that

$$\mathrm{bdir}(0^{2m} 1^{4m} 2^n 3^n) = \begin{cases}
(03)^n (02)^n (01)^{2m-2n} (11)^{m+n}, & \text{if } m \ge n; \\
(03)^n (02)^{2m-n} (12)^{2n-2m} (11)^{3m-n}, & \text{if } m \le n \le 2m; \\
(03)^{2m} (13)^{n-2m} (12)^n (11)^{3m-n}, & \text{if } 2m \le n \le 3m; \\
(03)^{2m} (13)^{n-2m} (12)^{6m-n} (22)^{n-3m}, & \text{if } 3m \le n \le 6m; \\
(03)^{2m} (13)^{4m} (23)^{n-6m} (22)^{3m}, & \text{if } n \ge 6m.
\end{cases}$$

Assume $\mathrm{bdir}(L)$ is a CFL. Then $L' := \mathrm{bdir}(L) \cap (03)^+ (13)^+ (22)^+$ is a CFL, and
from above we have $L' = \{(03)^{2m} (13)^{4m} (22)^{3m} : m \ge 1\}$, which is not a CFL. □

As Georg Zetzsche has kindly pointed out to us, the operation bdir was studied 
previously by Jantzen and Petersen [3]; they called it "twist". They proved our 
Theorems 20 and 21. 

We now return to the proof of Theorem 13, which was postponed until now. 
We need two lemmas: 

Lemma 22. Suppose $L$ is a regular language. Then $L' = \{w w^R : w \in L\}$ is a
CFL.

Proof. On input $x$, a PDA can guess $w$ and verify it is in $L$, while pushing it on
the stack. Nondeterministically it then guesses it is at the end of $w$ and pops the
stack, comparing to the input. □

Lemma 23. For all words $w$ we have $w ш w^R = \mathrm{bdir}(w)\, \mathrm{bdir}(w)^R$.

Proof. If $w$ is of even length then

$$\begin{aligned}
w ш w^R &= (\mathrm{fh}(w)\, \mathrm{lh}(w)) ш (\mathrm{fh}(w)\, \mathrm{lh}(w))^R \\
&= (\mathrm{fh}(w)\, \mathrm{lh}(w)) ш (\mathrm{lh}(w)^R\, \mathrm{fh}(w)^R) \\
&= (\mathrm{fh}(w) ш \mathrm{lh}(w)^R)(\mathrm{lh}(w) ш \mathrm{fh}(w)^R) \\
&= \mathrm{bdir}(w)\, \mathrm{bdir}(w)^R.
\end{aligned}$$

A similar proof works for $w$ of odd length. □
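Lemma 23 can be confirmed exhaustively on short words, including the odd-length case that the proof leaves to the reader. The sketch below is ours; `perfect_shuffle` allows the second word to be one letter longer, as in the extended definition, and `lemma_holds` is a hypothetical helper comparing both sides of the identity.

```python
from itertools import product


def perfect_shuffle(x: str, y: str) -> str:
    """Interleave x and y, allowing y to be one letter longer than x."""
    if len(y) not in (len(x), len(x) + 1):
        raise ValueError("need |y| = |x| or |y| = |x| + 1")
    return "".join(a + b for a, b in zip(x, y)) + y[len(x):]


def bdir(w: str) -> str:
    """Perfectly shuffle the first half of w with the reversed last half."""
    half = len(w) // 2
    return perfect_shuffle(w[:half], w[half:][::-1])


def lemma_holds(w: str) -> bool:
    """Check that w shuffled with its reverse equals bdir(w) bdir(w)^R."""
    return perfect_shuffle(w, w[::-1]) == bdir(w) + bdir(w)[::-1]


# All words over {0, 1, 2} of length at most 6.
print(all(lemma_holds("".join(t))
          for n in range(7)
          for t in product("012", repeat=n)))
```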

We can now prove Theorem 13.

Proof. From Lemma 23 we have

$$\mathrm{pssr}(L) = \bigcup_{x \in L} x ш x^R = \bigcup_{x \in L} \mathrm{bdir}(x)\, \mathrm{bdir}(x)^R = \bigcup_{x \in \mathrm{bdir}(L)} x x^R.$$

If $L$ is regular, then $\mathrm{bdir}(L)$ is regular, by Theorem 20. Then, from Lemma 22, it
follows that $\mathrm{pssr}(L)$ is a CFL. □